Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

Abstract:

In linguistics, a treebank is a parsed text corpus annotated with syntactic or semantic sentence structure. Treebank data has been exploited extensively ever since the first large-scale treebank, the Penn Treebank, was published. Although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. For example, annotated treebank data has been crucial in syntactic research for testing linguistic theories of sentence structure against large quantities of naturally occurring examples. A natural language parser consists of two basic parts, a POS tagger and a syntactic parser. A part-of-speech tagger (POS tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token), although computational applications generally use more fine-grained POS tags such as 'noun-plural'. A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Poorly designed context-free grammars and unsuitable structures such as Chomsky normal form can reduce the accuracy of a probabilistic context-free grammar (PCFG) parser. The unrealistic independence assumptions of a PCFG are one of the problems with such grammars. We have tried to mitigate this problem with parent and child annotation; parent annotation, for example, copies the label of a parent node onto the labels of its children, which can improve the performance of a PCFG. In grammar, a conjunction (conj) is a part of speech that connects words, phrases, or clauses, which are called the conjuncts of the conjunction.
In this study, we examined the conjunction phrases in the Persian treebank. The results show that adding structural dependencies to the grammar and modifying the basic rules can remove conjunction ambiguity and increase the accuracy of the PCFG parser. When a part-of-speech (POS) tagger assigns word class labels to tokens, it has to select from a set of possible labels whose size usually ranges from fifty to several hundred, depending on the language. We have also investigated the effect of fine-grained and coarse-grained POS tags, and of merging non-terminals, on the Persian PCFG parser.
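As an illustration of the parent annotation mentioned above, the following minimal sketch relabels each phrasal node with its parent's label and then estimates rule probabilities by relative frequency. The tuple-based tree format, the toy English sentence, and the function names (`parent_annotate`, `estimate_pcfg`) are assumptions made for this example; they are not taken from the paper or from the Persian treebank.

```python
from collections import Counter

# Assumed tree format: (label, child, child, ...); a word is a plain string
# and appears only under a POS tag (a preterminal node).

def parent_annotate(tree, parent=None):
    """Return a copy of `tree` whose phrasal labels carry the parent's label."""
    label, children = tree[0], tree[1:]
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    # In this sketch the root and the POS tags are left unannotated.
    if parent is None or is_preterminal:
        new_label = label
    else:
        new_label = f"{label}^{parent}"
    new_children = tuple(
        c if isinstance(c, str) else parent_annotate(c, label) for c in children
    )
    return (new_label,) + new_children

def count_rules(tree, counts):
    """Accumulate counts of the rules (lhs, rhs) used in `tree`."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter()
    for t in trees:
        count_rules(t, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Toy example (hypothetical): subject NP is a pronoun, object NP is DET + N.
tree = ("S",
        ("NP", ("PRO", "she")),
        ("VP", ("V", "reads"),
               ("NP", ("DET", "the"), ("N", "book"))))

plain = estimate_pcfg([tree])
annotated = estimate_pcfg([parent_annotate(tree)])
print(plain[("NP", ("PRO",))])        # NP -> PRO shares probability mass with NP -> DET N
print(annotated[("NP^S", ("PRO",))])  # subject NPs now have their own distribution
```

Without annotation, subject and object noun phrases share one distribution over expansions; after annotation, `NP^S` and `NP^VP` are separate non-terminals, which is how the annotation relaxes the independence assumption.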
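The difference between fine-grained and coarse-grained POS tags can be sketched as a simple relabeling of the preterminal nodes of a tree. The tag names (`N_PL`, `N_SING`, `V_PRS`), the underscore separator, and the helper names below are hypothetical and chosen only for illustration; the actual tag set and merging scheme used in the study are not given in the abstract.

```python
# Assumed tree format: (label, child, child, ...); a word is a plain string
# and appears only under a POS tag (a preterminal node).

def coarsen_tag(tag, sep="_"):
    """Map a fine-grained tag such as 'N_PL' to its coarse class 'N'."""
    return tag.split(sep, 1)[0]

def is_preterminal(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def coarsen_tree(tree):
    """Return a copy of `tree` with every POS tag replaced by its coarse class."""
    if is_preterminal(tree):
        return (coarsen_tag(tree[0]), tree[1])
    return (tree[0],) + tuple(
        c if isinstance(c, str) else coarsen_tree(c) for c in tree[1:]
    )

def pos_tags(tree):
    """Collect the set of POS tags used in `tree`."""
    if is_preterminal(tree):
        return {tree[0]}
    tags = set()
    for c in tree[1:]:
        if not isinstance(c, str):
            tags |= pos_tags(c)
    return tags

# Toy example with hypothetical fine-grained tags.
fine_tree = ("S",
             ("NP", ("N_PL", "students")),
             ("VP", ("V_PRS", "read"),
                    ("NP", ("N_SING", "poetry"))))

coarse_tree = coarsen_tree(fine_tree)
print(sorted(pos_tags(fine_tree)))    # three distinct fine-grained tags
print(sorted(pos_tags(coarse_tree)))  # two coarse classes
```

Coarser tags shrink the label set and the number of distinct grammar rules, at the cost of discarding distinctions (such as number) that a parser might otherwise exploit; which granularity works better is the empirical question the study addresses.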

Similar resources

Probabilistic Context-Free Grammar Induction Based on Structural Zeros

We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be a...

Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

This paper presents PCFG-BCL, an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus. Our analysis shows that this procedure uses a greedy approach to adding rules such that each set of rules that is added to the grammar results in th...

The effect of learning strategies on the speaking ability of Iranian students in the context of language institutes

Language learning strategies are among the most important factors that help language learners learn a foreign language and deal effectively with the four language skills, specifically speaking. Acknowledging the great impact of learning strategies...

Investigation of effective parameters on the rigidity of light composite diaphragms (PSSCB) by FEM

This thesis introduces PSSCB floors, composed of trapezoidal steel sheets combined with fiber cement boards, as prefabricated floors (compatible with the light steel frame structural system) and examines the parameters that affect floor rigidity. The study first models two tested floor specimens by the finite element method using the analysis software abaqus ver 6.10. The constructed specimens, under...

Question Identification Using a Probabilistic Context Free Grammar

This paper shows that using the tree structure generated from a Probabilistic Context Free Grammar parser adds meaningful information to language processing tasks, in particular, question identification. By using a part-of-speech representation of a sentence as a base line, this paper’s results show that adding features derived from the tree output of a Probabilistic Context Free Grammar parser...

A Probabilistic Context-free Grammar for Disambiguation in Morphological Parsing

One of the major problems one is faced with when decomposing words into their constituent parts is ambiguity: the generation of multiple analyses for one input word, many of which are implausible. In order to deal with ambiguity, the MORphological PArser MORPA is provided with a probabilistic context-free grammar (PCFG), i.e. it combines a "conventional" context-free morphological grammar to fi...

Volume 16, Issue 3

Pages 23-36

Publication date: 2019-12
